Extract, Transform, and Load (ETL) process in Power BI
The video above will outline the full ETL process for this project. For full end-to-end steps of this project, please watch the video. In the content below, I will outline the aims, reasoning and some of my thought processes behind the project.
Why undertake this project?
The ETL process is a key part of any business when accessing their data. The ETL process can vary greatly in complexity, so a basic understanding of this process is a necessity for any data analyst.
This project aims to demonstrate an example of the ETL process in Power BI, and the steps required along the way.
Note: Please be aware, this project is intended to act as an example only. In this particular instance, I am aware the same outcome could be achieved utilising data modelling. This is something I will explore in a future project.
The Task
For this project, the example business has a sales dataset in CSV format, an Excel workbook containing a list of their products, and another Excel workbook contain country population data that had been scrapped from a webpage.
The business desires to perform data analysis on this data so that the insights can be presented to relevant stakeholders. In their current format, data analysis would be difficult to achieve as the datasets are not currently linked. You can picture this as each dataset being on its own island, separated from the rest. To try to analyse the data now would be very manual and time-consuming. Also, because the datasets are not linked, you would be somewhat limited as to what insights the analysis could achieve. In fact, there is a high likelihood that any insights pulled from the data at this stage would be potentially misleading.
So how do we achieve to goal of extracting the data we need, and manipulating it so that it is ready for further analysis and visualisation?
This is where the ETL process comes in.
ETL is an abbreviation for "Extract, Transform and Load". The Extract step of the process is quite self-explanatory, you extract the desired data from its source. Once you have extracted the data, this is when you start the Transform step. Transforming the data can be quite broad in terms of what you do to the data. In simple terms, this is where you manipulate the data so that its structure and form enable you to complete your end goal, in this example, this is for data analysis and visualisation.
At this stage, you're now ready to Load the data, the final step of this process. In this project, as the goal is to analyse and visualise the data, we will be loading the data into Power BI. In fact, Power BI is a powerful tool and we have actually performed the entire ETL process in Power BI for this project!
At this stage, the ETL process will be complete and the data will be ready for further analysis and visualisation!